Enriching an Authority File of Scientific Conferences with Information Extracted from the Web

Authors

  • Heider Alvarenga de Jesus
  • Denilson Alves Pereira
Abstract

Corresponding Author: Denilson Alves Pereira, Department of Computer Science, Universidade Federal de Lavras, PO Box 3037, 37.200-000, Lavras, Brazil. Email: [email protected]

Abstract: Authority files maintain the variant forms used to refer to the same entity, and they are very useful in digital libraries. However, collecting data and keeping an authority file up to date is not a trivial task. This paper proposes an approach for enriching a publication venue authority file by extracting information on conferences from their web pages. Collecting additional data is important for improving the effectiveness of data disambiguation and information retrieval tools, such as those that measure the quality of a scientific publication based on bibliometrics (e.g., Journal Impact Factor). Most applications use only basic citation metadata, such as authors' names and the titles of the work and the publication venue. However, data external to the publication, contained in the publication venue's web page, can be very useful in the disambiguation task. Our approach includes steps for querying a web search engine, classifying the documents returned in the result sets, and extracting information from the relevant pages. We evaluated two methods for classifying documents, one based on genre and content and one based on content only. The experiments show good results in tracing the history of conference editions, with data such as the URL and year of each edition and the dates on which conference names changed.
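The three steps named in the abstract (query a search engine, classify the returned documents, extract information from the relevant pages) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `search` callable stands in for a real search engine client, the cue-word list and the `min_cues` threshold are hypothetical, and the classifier is a simple content-only keyword heuristic chosen for brevity.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    url: str
    text: str


@dataclass
class Edition:
    url: str
    year: int


# Hypothetical cue phrases; the paper does not list its features.
CONFERENCE_CUES = (
    "call for papers",
    "submission deadline",
    "program committee",
    "important dates",
)

# Four-digit years from 1980 to 2099 that are not part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)(19[89]\d|20\d\d)(?!\d)")


def is_conference_page(page: Page, min_cues: int = 2) -> bool:
    """Content-only heuristic: count how many cue phrases occur in the text."""
    text = page.text.lower()
    return sum(cue in text for cue in CONFERENCE_CUES) >= min_cues


def extract_year(page: Page) -> Optional[int]:
    """Return the most frequent year in the URL and text (ties: latest year)."""
    counts = Counter(int(y) for y in YEAR_RE.findall(page.url + " " + page.text))
    if not counts:
        return None
    return max(counts, key=lambda y: (counts[y], y))


def build_history(query: str, search: Callable[[str], List[Page]]) -> List[Edition]:
    """Query, classify, extract; return the editions sorted by year."""
    editions = []
    for page in search(query):           # step 1: query the search engine
        if not is_conference_page(page):  # step 2: classify the document
            continue
        year = extract_year(page)         # step 3: extract information
        if year is not None:
            editions.append(Edition(url=page.url, year=year))
    return sorted(editions, key=lambda e: e.year)
```

Passing the search function as a parameter keeps the sketch independent of any particular search API and makes it easy to exercise with canned pages.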

Similar resources

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Role of Scientific Authority in the Development Process in Iran: A Systematic Review

Objectives: Scientific authority, which means continuous referral by others to an individual or organization and its recognition as a theory-maker, helps a society develop socially, economically, and scientifically. The goal of this study was to explain the role of scientific authority in the development process of the country based on the conducted studies. Method: This study was conducted in...

Iranian Gateway for Scientific, Research, and Technological Information: A New Service for Iranian Researchers

Information subject gateways provide access to the necessary quality-controlled databases among the vast resources available to web users, sparing them the confusion and perplexity caused by the many sources on the web. The main objective of this research is to create an Iranian Gateway for Scientific, Research, and Technological Information as a valuable source for use by academics and researche...

Four Decades of Iran's Scientific Activity from the Perspective of Conference Papers, Highly Cited and Hot Papers, and Open Access Papers, with a Look at the Law of the Country's Economic, Social, and Cultural Development Plan

This study aims to investigate Iran's scientific production from the pre-revolutionary period to 2016, with an emphasis on conference proceedings, highly cited and hot papers, and open access papers, in the light of the Law of Economic, Social, and Cultural Development Plan of Iran. A descriptive-analytical method was used. To achieve the research objectives, data were extracted from Clarivate Analytics (Thomson Reuters)...

Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...


Journal title:
  • JCS

Volume 13  Issue 

Pages  -

Publication date 2017